
Open-vocabulary 3D object detection (OV-3Det) aims to generalize beyond the limited number ofbasecategories labeled during thetraining phase. Thebiggest bottleneck is the scarcity of annotated 3D data, whereas 2D image datasets are abundantandrichlyannotated.